This Jupyter notebook walks you through how to analyze BLI data generated using the "single tip method", in which a single tip is used repeatedly to measure binding across a range of titrant (the species not on the tip) concentrations.
To complete the analysis, run all cells in order, filling in or changing input values as needed. To export a plot made with pyplot, add plt.savefig('filename.jpg', dpi=200) to the cell in which the plot was generated.
Requirements: This notebook requires Python 3 and the following modules:

- plotly
- pandas
- matplotlib
- numpy
- lmfit

BLI_tools.py must be in the same directory as this notebook. If you have Anaconda, you probably already have all of these modules except lmfit. Install it using conda install -c conda-forge lmfit
On the BLI instrument, export the raw data file. It will be saved as something like RawData0.xls. If there are multiple raw data files, you can easily merge them as long as all of the data have different "tip well IDs". These are the wells in which the tips are stored prior to the experiment (see sensor plate image).
You will then need to make a sample_key.csv file. I make the file in Excel and then save it as a csv. The sample_key.csv file maps each "tip well ID" to the name of the species on the tip (tip) and the name of the species being titrated (titrant), one row per tip well.
example of what sample_key.csv should look like:
| tip well | tip | titrant |
|---|---|---|
| A6 | pCare | INV |
| B6 | pCare | INV |
| C6 | pCare | INV |
| A7 | empty | INV |
| B7 | empty | INV |
| C7 | empty | INV |
| D8 | Lpd3 | EVH1 only |
| D9 | empty | EVH1 only |
| D10 | SHIP2 | EVH1 only |
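If you'd rather not round-trip through Excel, the same file can be built directly with pandas. This is a hypothetical sketch; the only real requirement is the three columns 'tip well', 'tip', and 'titrant':

```python
import pandas as pd

# Build the sample key programmatically: one row per tip well ID,
# naming the species on the tip and the species being titrated.
rows = [
    ('A6', 'pCare', 'INV'),
    ('B6', 'pCare', 'INV'),
    ('A7', 'empty', 'INV'),
    ('D8', 'Lpd3',  'EVH1 only'),
]
sample_key = pd.DataFrame(rows, columns=['tip well', 'tip', 'titrant'])
sample_key.to_csv('sample_key.csv', index=False)
```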
I wrote this script with the intention of it being modular, meaning you can look at and plot your data in different ways and handle complicated plate setups dynamically: you can plot subsets of the data in stages, decide which samples to calculate binding curves for, process samples with different loading times using different methods, and so on.
Therefore, for now, it requires you to examine the data and make decisions rather than just pressing go. If you have very reliable, normal-looking data that always looks the same, it would be fairly easy to turn this into a one-shot script.
Basic pipeline: most of the functions for processing the data are in the file BLI_tools.py. I use pandas heavily, so more or less everything lives in a DataFrame.
1. Load sample_key.csv as a dataframe.
2. Import each raw data file with BLI_tools.import_raw_data().
3. Create a BLI_data = BLI_tools.BLI_data_df() object using the sample key and raw data as inputs.
4. Plot with BLI_data.pyplot_plot_samples() or BLI_data.plotly_plot_samples() at any point. I like to use BLI_data.plotly_plot_samples() and save an .html file of all the raw data at first so that I can look at it interactively later.
5. Use the BLI_data.set...() methods to select a subset of the data in different ways. There is a method for setting wells based on tip/titrant names, enumerating through well IDs, or setting the wells directly (see below for more details).
6. Optionally baseline-normalize with BLI_data.zero_cols(). It won't be relevant for the binding signal calculation, however.
7. Call BLI_data.set_assay_times() to set the experimental parameters used for calculating the binding signal.
8. Preview the signal windows with BLI_data.binding_signal_preview().
9. Generate binding curves with the BLI_data.generate_binding_curves2() function and further process them however you would like.
For the data here, I will be working with multiple datasets collected with the same plate, on the same date.
Imports
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
pd.options.plotting.backend = "plotly"
import plotly.graph_objects as go
import plotly.express as px
plt.style.use('custom_standard')
# plt.style.use('custom_small')
from lmfit import Model
import BLI_tools as b
%load_ext autoreload
%autoreload 2
We have multiple data files, so I am just going to make a datafiles list and import/merge all of them into one dataframe.
datafiles = ['./Mena_INV_Pcare_Empty/RawData0.xls',
'./EVH1_only_LPD3_SHIP2/Experiment_1/RawData1.xls',
'./EVH1_only_LPD3_SHIP2/Experiment_1/RawData0.xls']
sample_key = pd.read_csv('./sample_key.csv')
dfs = []
for file in datafiles:
    dfs.append(b.import_raw_data(file))
df = pd.concat(dfs, axis=1)
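Because pd.concat(axis=1) will happily stack columns with identical names, it's worth checking that the merged frame has no duplicate tip well IDs. A minimal sketch, with toy frames standing in for the imported raw data (assuming one column per tip well ID):

```python
import pandas as pd

# Toy stand-ins for two imported raw-data frames, one column per tip well.
df1 = pd.DataFrame({'A6': [0.10, 0.21], 'B6': [0.02, 0.05]})
df2 = pd.DataFrame({'D8': [0.33, 0.41], 'D9': [0.00, 0.01]})
merged = pd.concat([df1, df2], axis=1)

# The merge is only valid if every tip well ID appears exactly once.
dupes = merged.columns[merged.columns.duplicated()].tolist()
assert not dupes, f'duplicate tip well IDs: {dupes}'
```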
Plot all the data and export it as an html file using plotly. You can open this file in a web browser later and look at the data interactively.
data = b.BLI_data_df(df,sample_key)
fig = data.plotly_plot_samples()
fig.show()
fig.write_html('all_data.html')
You can set the sample wells to use with several different methods:

- data.set_wells_from_samplekey(tip='all', titrant='all')
- data.set_wells_directly(cols), e.g. data.set_wells_directly(['A1', 'B2', 'C10'])
- data.set_wells_by_enumerating(letters, numbers), e.g.:

>>> set_wells_by_enumerating('A', list(range(1, 7)))
['A1','A2','A3','A4','A5','A6']
>>> set_wells_by_enumerating('ABCD', [1,2,3])
['A1', 'A2', 'A3', 'B1', 'B2', 'B3', 'C1', 'C2', 'C3', 'D1', 'D2', 'D3']
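The enumeration behaves like a row-major cartesian product of row letters and column numbers. A standalone sketch of the same logic (not the library function itself):

```python
def enumerate_wells(letters, numbers):
    # Cartesian product of row letters and column numbers, row-major,
    # matching the set_wells_by_enumerating examples above.
    if isinstance(numbers, int):
        numbers = [numbers]
    return [f'{letter}{n}' for letter in letters for n in numbers]
```

So enumerate_wells('ABC', 6) reproduces the ['A6', 'B6', 'C6'] selection used later in this notebook.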
Once you set the sample wells, any subsequent methods run on the data object will only use the wells that you set (see example below).
Use data.sample_wells to see the currently set samples.
data.set_wells_from_samplekey(tip=['Lpd3','SHIP2'])
print(data.sample_wells)
data.pyplot_plot_samples()
['D8', 'D10']
Setting the sample_wells does not remove any data from the object, it just designates which wells to use in subsequent methods.
Example - we can still set different wells after we've already run the cell above:
data.set_wells_by_enumerating('ABC',6)
data.pyplot_plot_samples()
print(data.sample_wells)
['A6', 'B6', 'C6']
data.set_wells_from_samplekey(tip='all', titrant = 'INV')
data.pyplot_plot_samples()
plt.savefig('sig_INV.jpg',dpi=200)
data.set_wells_from_samplekey(tip='empty', titrant = 'INV')
data.pyplot_plot_samples()
plt.savefig('sig_INV_empty.jpg',dpi=200)
Use the data.zero_cols() function to baseline-normalize the data. Again, this doesn't destroy the original data (data.df); it creates a new normalized dataset (data.baseline_subtracted_data). Baseline normalization is irrelevant for calculating binding curves the way we calculate them here, but you sometimes want to normalize for plotting.
data.set_wells_from_samplekey(titrant='INV')
data.zero_cols(280,290)
data.plotly_plot_samples(subtracted_data=True)
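Under the hood, zeroing on a window like (280, 290) presumably amounts to subtracting each trace's mean response over that baseline window. A minimal numpy/pandas sketch of that assumption (not the BLI_tools code itself):

```python
import numpy as np
import pandas as pd

# Synthetic trace indexed by time (s); one column per tip well.
time = np.arange(0, 300, 5.0)
df = pd.DataFrame({'A6': 0.5 + 0.001 * time}, index=time)

# Subtract the mean response in the 280-290 s window from every column,
# so each trace reads ~0 over the chosen baseline.
window = (df.index >= 280) & (df.index <= 290)
zeroed = df - df[window].mean()
```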
EVH1_concentrations = [3.125, 6.25, 12.5, 25, 50, 100]
association_t = 60  # seconds. Length of the association step
dissociation_t = association_t  # length of the dissociation step
dt = 10  # amount of time (s) to average over to get the signal
Let's look at this data first. We need to find the time point that we want to define as our first binding signal (at the end of the first association phase). Since the three different peptides have different loading times, we will have to process each titration individually, using a different initial offset time for each.
data.set_wells_from_samplekey(titrant='EVH1 only')
data.pyplot_plot_samples()
plt.savefig('sig_EVH1_only.jpg',dpi=200)
data.plotly_plot_samples()
Set first_association_time for the first sample and use the data.binding_signal_preview() function to see how the signals are being calculated.
data.set_wells_directly('D8')
first_association_time = 203 # seconds - time that you want to use for the first binding signal
data.set_assay_times(first_association_time, association_t, dissociation_t, dt, EVH1_concentrations)
data.binding_signal_preview()
plt.savefig('signal_areas.jpg',dpi=200)
The binding signal is calculated as:
(average value within the red shaded region) - (average value within the subsequent grey region)
See data.set_assay_times() documentation for more details
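In other words, each point on the binding curve is a difference of two window means. A self-contained sketch of that arithmetic, based on my reading of the preview plot (the placement of the grey window at the end of the dissociation step is an assumption here, not taken from the library code):

```python
import numpy as np

def binding_signal(t, y, t_assoc_end, dissociation_t, dt):
    # mean over the dt seconds ending at the association endpoint (red region)
    red = (t > t_assoc_end - dt) & (t <= t_assoc_end)
    # mean over the dt seconds ending the subsequent dissociation step (grey region)
    grey_end = t_assoc_end + dissociation_t
    grey = (t > grey_end - dt) & (t <= grey_end)
    return y[red].mean() - y[grey].mean()

# Synthetic step trace: plateau of 1.0 at the end of association,
# decayed to 0.3 by the end of dissociation.
t = np.arange(0, 400, 1.0)
y = np.full_like(t, 0.3)
y[(t > 150) & (t <= 203)] = 1.0
```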
binding_curves = data.generate_binding_curves2()
binding_curves
|  | D8 |
|---|---|
| 3.125 | 0.256532 |
| 6.25 | 0.36928 |
| 12.5 | 0.505054 |
| 25 | 0.646539 |
| 50 | 0.812717 |
| 100 | 1.01198 |
well = 'D9'
data.set_wells_directly(well)
first_association_time = 225 # seconds - time that you want to use for the first binding signal
data.set_assay_times(first_association_time, association_t, dissociation_t, dt, EVH1_concentrations)
data.binding_signal_preview()
binding_curves[well] = data.generate_binding_curves2()
binding_curves
|  | D8 | D9 |
|---|---|---|
| 3.125 | 0.256532 | 0.0112163 |
| 6.25 | 0.36928 | 0.0101381 |
| 12.5 | 0.505054 | 0.0178333 |
| 25 | 0.646539 | 0.020731 |
| 50 | 0.812717 | 0.037102 |
| 100 | 1.01198 | 0.0700838 |
well = 'D10'
data.set_wells_directly(well)
first_association_time = 220 # seconds - time that you want to use for the first binding signal
data.set_assay_times(first_association_time, association_t, dissociation_t, dt, EVH1_concentrations)
data.binding_signal_preview()
binding_curves[well] = data.generate_binding_curves2()
binding_curves
|  | D8 | D9 | D10 |
|---|---|---|---|
| 3.125 | 0.256532 | 0.0112163 | 0.11778 |
| 6.25 | 0.36928 | 0.0101381 | 0.179966 |
| 12.5 | 0.505054 | 0.0178333 | 0.267412 |
| 25 | 0.646539 | 0.020731 | 0.358878 |
| 50 | 0.812717 | 0.037102 | 0.471634 |
| 100 | 1.01198 | 0.0700838 | 0.601491 |
empty_well = 'D9'
pep_wells=b.get_sample_wells(sample_key, tip='all', titrant='EVH1 only')
pep_wells.remove(empty_well)
empty_sig = binding_curves[empty_well]
fits = binding_curves[pep_wells].copy()
for i in pep_wells:
fits[i]=fits[i]-empty_sig
fits = fits.T
fits
|  | 3.125 | 6.25 | 12.5 | 25 | 50 | 100 |
|---|---|---|---|---|---|---|
| D8 | 0.245316 | 0.359142 | 0.48722 | 0.625808 | 0.775615 | 0.941894 |
| D10 | 0.106563 | 0.169828 | 0.249579 | 0.338147 | 0.434532 | 0.531407 |
def fit_Kd(y, x, sample_key, plot=True):
    def basic_fit(x, init, sat, Kd):
        return init + (sat - init) * x / (x + Kd)
    gmod = Model(basic_fit, nan_policy='omit')
    # print(gmod.param_names)
    # print(gmod.independent_vars)
    gmod.set_param_hint('Kd', value=0.1, min=0, max=40000)
    res = gmod.fit(list(y), x=x, init=0, sat=1)
    s = sample_key[sample_key['tip well'] == y.name][['tip', 'titrant']].values[0]
    title = '{} - {} and {}'.format(y.name, s[0], s[1])
    if plot:
        plt.figure()
        res.plot_fit(numpoints=10000)
        plt.title("{} (Kd = {:.3g})".format(title, res.params['Kd'].value))
        plt.ylabel('binding signal - response (nm)')
        plt.xlabel('EVH1 concentration (uM)')
        plt.tight_layout()
        plt.savefig(title + '.jpg', dpi=200)
    return {'Kd': res.params['Kd'].value, 'fit SE': res.params['Kd'].stderr}
def apply_fit(df, concentrations, cols):
    # df = df.dropna()
    temp = df[cols].apply(
        fit_Kd,
        axis=1,
        result_type='expand',
        **{'x': concentrations,
           'sample_key': sample_key}
    )
    df = pd.concat([df, temp], axis='columns')
    return df
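As a sanity check on the model itself, the same one-site isotherm can be fit with scipy.optimize.curve_fit (a deliberate stand-in for lmfit here, just to exercise the curve shape on synthetic data). Note that at x = Kd the signal sits exactly halfway between init and sat:

```python
import numpy as np
from scipy.optimize import curve_fit

def basic_fit(x, init, sat, Kd):
    # one-site binding isotherm: init at x = 0, approaching sat as x >> Kd
    return init + (sat - init) * x / (x + Kd)

# Synthetic, noise-free curve at the concentrations used in this notebook.
x = np.array([3.125, 6.25, 12.5, 25, 50, 100])
y = basic_fit(x, 0.0, 1.5, 26.0)
popt, _ = curve_fit(basic_fit, x, y, p0=[0.0, 1.0, 10.0])
```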
cols = [str(i) for i in EVH1_concentrations]
df = apply_fit(fits, EVH1_concentrations, cols)
df = df.reset_index().rename(columns={'index':'tip well'})
df2 = pd.merge(sample_key, df, on='tip well')
df2
|  | tip well | tip | titrant | 3.125 | 6.25 | 12.5 | 25 | 50 | 100 | Kd | fit SE |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | D8 | Lpd3 | EVH1 only | 0.245316 | 0.359142 | 0.48722 | 0.625808 | 0.775615 | 0.941894 | 26.228384 | 5.119548 |
| 1 | D10 | SHIP2 | EVH1 only | 0.106563 | 0.169828 | 0.249579 | 0.338147 | 0.434532 | 0.531407 | 26.725968 | 3.600560 |
df2.to_csv('empty_subtracted_binding_curves.csv',index=False)
bc2 = binding_curves.T
bc2 = bc2.reset_index().rename(columns={'index':'tip well'})
bc3 = pd.merge(sample_key, bc2, on='tip well')
bc3.to_csv('unsub_binding_curves.csv',index=False)